Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)"#4332
Revert "[wingman -> rllib] IMPALA MultiDiscrete changes (#3967)"#4332ericl merged 1 commit intoray-project:masterfrom
Conversation
|
Test PASSed. |
@robertnishihara is this misdetection due to a revert or is it some other bug? |
|
@ericl This is the new performance testing jenkins, which only kicks in if the PR string contains ray-core afaik (@devin-petersohn), i.e. if something performance-related has changed. ray-core should probably be changed to core. |
|
Ah ok, didn't realize it had a different name. |
| log_rhos = get_log_rhos(target_action_log_probs, | ||
| behaviour_action_log_probs) | ||
|
|
||
| log_rhos = target_action_log_probs - behaviour_action_log_probs |
There was a problem hiding this comment.
One potential source of the bug is if this guy has a wrong shape.
There was a problem hiding this comment.
I checked the shapes here and it looked reasonable (at least they match).
|
Test FAILed. |
…project#3967)" (ray-project#4332)" This reverts commit 3c41cb9.
What do these changes do?
There seems to be a bug here, isolated to the changes in vtrace.py. Reverting that file and patching up multi_from_logits to call from_logits fixes the regression. Unfortunately I don't have time to investigate this further so we should revert this change and re-land it once the bug is found.
Interestingly, vtrace.py still passes its original unit tests.
The regression is fairly straightforward to reproduce: on any GPU machine run:
Within a few minutes it gets to 1m+ timesteps. Before the regression, you can expect 10+ reward at this point. After the regression, the reward will top out at 1-2.
Related issue number
Atari performance regression originally reported in #4329